Learning Chomsky-like Grammars for Biological Sequence Families

Authors

  • Stephen Muggleton
  • Christopher H. Bryant
  • Ashwin Srinivasan
Abstract

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). As far as these authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the positive-only learning framework of CProgol. Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using our best NPP predictor as a filter is more than [...] times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. The highest RA was achieved by a model which includes grammar-derived features. This RA is significantly higher than the best RA achieved without the use of the grammar-derived features.


Similar resources

Learning Phonological Mappings by Learning Strictly Local Functions*

The current study identifies locality as a near-universal property of phonological input-output mappings that describe processes with local triggers and presents a learning algorithm which uses locality as an inductive principle to generalize such mappings from finite data. Input-output (or UR-SR) mappings like the one in (1) are integral to both the rewrite rules of Sound Pattern of English (C...


Generating all Circular Shifts by Context-Free Grammars in Chomsky Normal Form

Let {a1, a2, . . . , an} be an alphabet of n symbols and let Cn be the language of circular shifts of the word a1a2 · · · an; so Cn = {a1a2 · · · an−1an, a2a3 · · · ana1, . . . , ana1 · · · an−2an−1}. We discuss a few families of context-free grammars Gn (n ≥ 1) in Chomsky normal form such that Gn generates Cn. The grammars in these families are investigated with respect to their descriptional ...


Formal Properties of Categorial Grammars

We discuss two standard formal tools used to study models of grammar. One of these is formal language theory, which provides a way to describe the complexity of languages in terms of a sequence of standard language classes known as the Chomsky hierarchy. The other tool is learnability theory, which can describe, for a given class of languages, whether or not there exists a single learner that c...


Multidimensional trees and a Chomsky-Schützenberger-Weir representation theorem for simple context-free tree grammars

Weir [43] proved a Chomsky-Schützenberger-like representation theorem for the string languages of tree-adjoining grammars, where the Dyck language Dn in the Chomsky-Schützenberger characterization is replaced by the intersection D2n ∩ g(D2n), where g is a certain bijection on the alphabet consisting of 2n pairs of brackets. This paper presents a generalization of this theorem to the string lang...


Generating all permutations by context-free grammars in Chomsky normal form

Let Ln be the finite language of all n! strings that are permutations of n different symbols (n ≥ 1). We consider context-free grammars Gn in Chomsky normal form that generate Ln. In particular we study a few families {Gn}n≥1, satisfying L(Gn) = Ln for n ≥ 1, with respect to their descriptional complexity, i.e. we determine the number of nonterminal symbols and the number of production rules of...




Publication date: 2000